Add fix_task to verifier.json: mechanical-vs-authority repair routing #146

Merged
pengfei-threemoonslab merged 3 commits into main from feat/verifier-fix-task
May 30, 2026

@pengfei-threemoonslab
Contributor

What

Adds fix_task to verifier.json — the single, deterministic repair instruction a verify run hands to whoever must act next. Stacked on #145 (the verdict-contract lock); review that first.

Why

A merge verdict of blocked / human_review_required is just friction unless it also says what to do next and who can safely do it. fix_task is that contract, and its routing encodes the product's core safety boundary: a coding agent may fix mechanical gaps, but an authority gap — approval/idempotency evidence it cannot prove, a weakened policy, a touched trust root — must route to a human so the agent cannot invent its way to green (reward hacking).

Routing (a pure projection of the head scan, never a model judgment)

| Condition | actor | safe_to_attempt |
| --- | --- | --- |
| mergeable | (no fix_task) | n/a |
| every gating finding autofix_safe | coding_agent | true |
| any gating finding requires_human_review | human | false |
| policy_weakened / trust_root_touched | human | false |
| insufficient_evidence / unknown | human | false |

Routing is by the per-finding autofix_safe signal, not the verdict label — a blocked-but-mechanical PR can still route to the agent, and a review-required PR with an authority gap routes to a human.
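
The routing table can be read as a small pure function. The following is a minimal sketch under stated assumptions, not the code in cli/verify/fix_task.py: the `Finding` and `Route` types, the `route_fix_task` name, and its parameters are hypothetical stand-ins. The final fallthrough to a human (for a non-mergeable verdict whose findings are not all mechanical) is also an assumption, since the table does not spell that case out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Finding:
    """Hypothetical stand-in for one gating finding from the head scan."""
    rule_id: str
    autofix_safe: bool
    requires_human_review: bool


@dataclass(frozen=True)
class Route:
    """Hypothetical stand-in for the two routing fields of fix_task."""
    actor: str  # "coding_agent" or "human"
    safe_to_attempt: bool


def route_fix_task(
    merge_verdict: str,
    gating_findings: list[Finding],
    policy_weakened: bool = False,
    trust_root_touched: bool = False,
) -> Optional[Route]:
    """Project scan facts onto a route; no model judgment is involved."""
    if merge_verdict == "mergeable":
        return None  # nothing to repair, so no fix_task
    human = Route(actor="human", safe_to_attempt=False)
    if merge_verdict in ("insufficient_evidence", "unknown"):
        return human
    if policy_weakened or trust_root_touched:
        return human
    if any(f.requires_human_review for f in gating_findings):
        return human
    if gating_findings and all(f.autofix_safe for f in gating_findings):
        return Route(actor="coding_agent", safe_to_attempt=True)
    # Assumption: anything not provably mechanical goes to a human.
    return human
```

Note that the verdict label only matters for the mergeable and degraded-evidence rows; a blocked verdict with one autofix_safe finding still routes to the coding agent in this sketch.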

Changes

  • VerifierFixTask schema (actor, safe_to_attempt, instructions[], forbidden_shortcuts[], verification_command) with a validator enforcing actor="human" ⇒ safe_to_attempt=False (a human-authority task can never be marked agent-safe).
  • cli/verify/fix_task.py build_fix_task() — deterministic projection; forbidden_shortcuts are the anti-reward-hacking guardrails (no suppression, no severity-lowering, no inventing evidence, no weakening the policy that evaluates the change).
  • Orchestrator wiring; first_next_action.actor now routes through the same fix_task so the two agent-facing signals can't disagree.
  • PR comment renders fix_task as the authoritative "Required before merge" block (falls back to the prior path when absent).
  • Regenerated docs/verifier-schema.v0.1.json (additive optional field).
  • tests/test_fix_task_contract.py (13 tests).
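
To illustrate the schema invariant from the first bullet, here is a minimal sketch that uses a standard-library dataclass as a stand-in for the real model. The class name `FixTaskSketch`, the field types, and the defaults are assumptions; only the field names and the human-implies-unsafe rule come from the description above.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FixTaskSketch:
    """Hypothetical stand-in for VerifierFixTask (field types are assumed)."""
    actor: str
    safe_to_attempt: bool
    instructions: list[str] = field(default_factory=list)
    forbidden_shortcuts: list[str] = field(default_factory=list)
    verification_command: str = ""

    def __post_init__(self) -> None:
        if self.actor not in ("coding_agent", "human"):
            raise ValueError(f"unknown actor: {self.actor!r}")
        # The invariant: a human-authority task is never agent-safe.
        if self.actor == "human" and self.safe_to_attempt:
            raise ValueError(
                "a human-authority task cannot be marked safe_to_attempt"
            )
```

Because the check runs at construction time, an inconsistent task cannot exist as a value; callers do not need to re-check the invariant downstream.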

Verification

  • Full suite: 2316 passed, 4 skipped, 0 failed
  • python scripts/generate_schemas.py --check: clean
  • ruff check: clean

🤖 Generated with Claude Code

pengfei-threemoonslab and others added 3 commits May 29, 2026 22:41
…n guards

The verify cycle (M1-M3) already computes one release verdict in
build_release_decision() and projects it onto the report summaries and the
agent-facing merge_verdict, but the discipline was enforced only by
convention and docstrings. This makes it structural.

- Define ReleaseDecisionStatus once in schemas/common.py and reuse it for
  AgentSummary.verdict, ReviewerSummary.verdict, VerifierVerdict, and
  ReleaseConsequence.decision (previously four hand-respelled Literals of
  the same vocabulary). Generated JSON schemas are byte-identical
  (generate_schemas.py --check clean) - no wire change, no schema bump.
- Type _DECISION_TO_VERDICT as dict[ReleaseDecisionStatus, MergeVerdict]
  and add a totality test, so a new release status without a mapping fails
  CI instead of silently falling back to human_review_required.
- Add a VerifierArtifact model_validator: when a head release_decision is
  present, merge_verdict and the decision copy MUST be exact projections of
  it - an inconsistent artifact is impossible to construct.
- Centralize the no-decision verdict rule in merge_verdict_for(); delete the
  divergent inline _merge_verdict in the orchestrator (summaries defaulted
  to "passed", verify defaulted to "mergeable"/"unknown" - now one rule).
- Add tests/test_verdict_contract.py pinning canonical-enum reuse across all
  verdict surfaces, projection totality + the exact table, the fail-safe
  (unknown status never auto-passes), and the validator.
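
The totality check described above can be sketched with `typing.get_args` over a `Literal`. The vocabularies below are hypothetical placeholders (the real values live in schemas/common.py and are not shown in this PR), and `unmapped_statuses` is an illustrative helper name, not the project's test.

```python
from typing import Literal, get_args

# Hypothetical vocabularies, for illustration only.
ReleaseDecisionStatus = Literal["passed", "blocked", "needs_review"]
MergeVerdict = Literal["mergeable", "blocked", "human_review_required", "unknown"]

DECISION_TO_VERDICT: dict[ReleaseDecisionStatus, MergeVerdict] = {
    "passed": "mergeable",
    "blocked": "blocked",
    "needs_review": "human_review_required",
}


def unmapped_statuses(mapping: dict) -> set:
    """Return every release status that has no entry in the mapping."""
    return set(get_args(ReleaseDecisionStatus)) - set(mapping)


def test_projection_is_total() -> None:
    # A new status added to the Literal without a mapping fails here,
    # instead of silently taking a fallback at runtime.
    assert unmapped_statuses(DECISION_TO_VERDICT) == set()
    # Every target must also be a member of the verdict vocabulary.
    assert set(DECISION_TO_VERDICT.values()) <= set(get_args(MergeVerdict))
```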

Full suite: 2302 passed, 4 skipped. Behavior unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…repair routing

fix_task is the single repair instruction a verify run hands to whoever acts
next. Routing is a pure projection of the head scan - never a model judgment:
a coding agent may fix mechanical gaps (every gating finding is autofix_safe),
but any authority gap (approval/idempotency evidence it cannot prove, a
weakened policy, a touched trust root, or degraded evidence) routes to a human
so the agent cannot invent its way to green.

- Add VerifierFixTask {actor, safe_to_attempt, instructions, forbidden_shortcuts,
  verification_command} with a validator: an actor='human' task can never be
  marked safe_to_attempt (the anti-reward-hacking guarantee).
- Add cli/verify/fix_task.py build_fix_task(): projects release_decision plus
  per-finding autofix_safe / requires_human_review into the task; policy_weakened,
  trust_root_touched, and insufficient_evidence force the human route.
- Wire fix_task into the verifier artifact and route first_next_action.actor
  through the same fix_task so the two agent-facing signals never disagree.
- Render fix_task as the authoritative "Required before merge" block in the PR
  comment (falls back to the prior agent-summary path when absent).
- Regenerate docs/verifier-schema.v0.1.json (additive optional field).
- Add tests/test_fix_task_contract.py.

Full suite: 2316 passed, 4 skipped.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… single-source next action

Addresses review feedback on #146:

- Fail closed in the coding-agent route: require every gating finding to be
  explicitly mechanical (autofix_safe is True AND requires_human_review is
  False). A finding with None/False routing fields (stale/plugin/legacy) is now
  treated as an authority gap and routed to a human, never silently marked
  safe_to_attempt.
- Shell-quote the refs in verification_command with shlex.quote (a valid git
  ref can contain ';', so the unquoted command was injectable); render it
  through the backtick-stripping _code helper so the PR comment stays
  Markdown-safe.
- first_next_action borrows the agent-summary action only when its implied
  actor agrees with fix_task; otherwise actor, command, and why are all derived
  from fix_task so the two agent-facing signals cannot contradict.
- Emit a human fix_task for a non-mergeable verdict with no head report
  (unknown), so the routing table holds and every non-mergeable verdict carries
  a fix_task.
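
Two of the review fixes above are easy to show in isolation. This is a hedged sketch, not the merged code: the function names are hypothetical, and `example-verify` with its `--base` / `--head` flags is a made-up command used only to show the quoting. `shlex.quote` itself is the standard-library call named in the commit message.

```python
import shlex
from typing import Optional


def is_explicitly_mechanical(
    autofix_safe: Optional[bool],
    requires_human_review: Optional[bool],
) -> bool:
    """Fail closed: None (stale/plugin/legacy findings) is not mechanical."""
    return autofix_safe is True and requires_human_review is False


def build_verification_command(base_ref: str, head_ref: str) -> str:
    """Quote each ref so shell metacharacters in a ref name stay inert.

    The command name and flags are hypothetical placeholders.
    """
    return (
        f"example-verify --base {shlex.quote(base_ref)} "
        f"--head {shlex.quote(head_ref)}"
    )
```

With identity checks (`is True`, `is False`) rather than truthiness, a missing routing field can never count as a mechanical finding. Splitting the quoted command with `shlex.split` returns the original refs as single arguments, which is a convenient property to assert in a test.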

Full suite: 2323 passed, 4 skipped.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Base automatically changed from feat/verdict-contract-lock to main May 30, 2026 06:32
@pengfei-threemoonslab pengfei-threemoonslab merged commit 6c2c416 into main May 30, 2026
1 check passed
pengfei-threemoonslab added a commit that referenced this pull request May 30, 2026
* Surface the self-approval prohibition at the top of verifier.json

When a PR weakens the release policy or touches a trust root, a coding agent
must not silently self-approve a change to its own gate. That prohibition was
only present inside a fix_task instruction (PR #146); promote it to the two
fields an agent reads first.

- Add _self_approval_note(): the explicit "a coding agent cannot self-approve
  that change - a human must review it" message for policy_weakened (taking
  precedence) and trust_root_touched.
- verifier.json headline leads with the note when present.
- human_review.why leads with the note, and a self-approval note forces
  human_review.required=True regardless of the verdict path.
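
The precedence rule for the note can be sketched as follows. The function names and the leading clause of each message are assumptions; only the quoted "cannot self-approve" wording and the policy_weakened-first ordering come from the commit message.

```python
from typing import Optional

_SUFFIX = "a coding agent cannot self-approve that change - a human must review it"


def self_approval_note(
    policy_weakened: bool, trust_root_touched: bool
) -> Optional[str]:
    """Return the note, with policy_weakened taking precedence."""
    if policy_weakened:
        return f"This PR weakens the release policy: {_SUFFIX}."
    if trust_root_touched:
        return f"This PR touches a trust root: {_SUFFIX}."
    return None


def lead_with_note(text: str, note: Optional[str]) -> str:
    """Put the note first in a headline or human_review.why string."""
    return f"{note} {text}" if note else text


def human_review_required(verdict_requires_review: bool, note: Optional[str]) -> bool:
    """A note forces review regardless of the verdict path."""
    return verdict_requires_review or note is not None
```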

Full suite: 2346 passed, 4 skipped. No schema change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Self-approval: keep all top-level convenience fields consistent (review fix)

Addresses review of #148: a self-approval note forced human_review.required=True,
but can_merge_without_human and first_next_action still keyed only off
merge_verdict, so the defensive (mergeable + note) path could emit "human review
required" and "safe to merge" at once.

- _can_merge_without_human returns False whenever a self-approval note exists.
- _first_next_action routes to a human review (never the "safe to merge" action)
  when a self-approval note is present, including the fix_task-None defensive
  case.
- Both thread capability_review from _build_verifier. Clean mergeable behavior
  (no note) is unchanged; covered by a regression test.
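
A minimal sketch of the consistency rule, assuming the convenience fields are derived from the verdict, the optional note, and the optional fix_task actor. The function signatures and the returned action labels are hypothetical; the real helpers take other inputs (for example capability_review) that are not shown here.

```python
from typing import Optional


def can_merge_without_human(merge_verdict: str, note: Optional[str]) -> bool:
    """False whenever a self-approval note exists, even if mergeable."""
    return merge_verdict == "mergeable" and note is None


def first_next_action(
    merge_verdict: str,
    note: Optional[str],
    fix_task_actor: Optional[str],
) -> str:
    """Return a hypothetical action label; the note always wins."""
    if note is not None:
        return "human_review"  # covers the fix_task-None defensive case
    if fix_task_actor is not None:
        return f"fix_by_{fix_task_actor}"
    if merge_verdict == "mergeable":
        return "safe_to_merge"
    return "human_review"
```

In this sketch the defensive (mergeable + note) path yields `can_merge_without_human == False` and a human-review action together, so the two fields cannot contradict each other.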

Full suite: 2349 passed, 4 skipped. No schema change.